GMM-Based Missing-Feature Reconstruction on Multi-Frame Windows

Authors

  • Ulpu Remes
  • Yoshihiko Nankaku
  • Keiichi Tokuda
Abstract

Methods for missing-feature reconstruction substitute noise-corrupted features with clean-speech estimates calculated from reliable information found in the noisy speech signal. Gaussian mixture model (GMM) based reconstruction has conventionally focussed on reliable information present in a single frame. In this work, GMM-based reconstruction is applied to windows that span several time frames. Mixtures of factor analysers (MFA) are used to limit the number of model parameters needed to describe the feature distribution as window width increases. Using the window-based MFA in a noisy speech recognition task resulted in relative error reductions of up to 52% compared to frame-based GMM.
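The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes plain conditional-mean imputation (no bounds on the unreliable components), a binary reliability mask supplied by the caller, and GMM parameters that are already trained. The function names `stack_windows`, `mfa_covariance`, and `reconstruct` are hypothetical. For a window of dimension d, a full covariance matrix has d(d+1)/2 free parameters, whereas a factor-analyser covariance with q factors has dq + d, which is why the low-rank form scales better as the window widens.

```python
import numpy as np


def stack_windows(frames, width):
    """Stack `width` consecutive frames into one long vector per window."""
    n_frames = frames.shape[0]
    n_windows = n_frames - width + 1
    return np.stack([frames[t:t + width].reshape(-1) for t in range(n_windows)])


def mfa_covariance(loading, noise_var):
    """Factor-analyser covariance: low-rank term plus diagonal noise.

    loading has shape (d, q) and noise_var has shape (d,), so the model
    stores d*q + d numbers instead of d*(d+1)/2 for a full covariance.
    """
    return loading @ loading.T + np.diag(noise_var)


def reconstruct(x, reliable, weights, means, covs):
    """Replace unreliable components of x with their GMM conditional mean.

    Assumption: unbounded conditional-mean imputation given the reliable
    components; at least one component of x must be marked reliable.
    """
    r = np.asarray(reliable, dtype=bool)
    u = ~r
    if not u.any():
        return x.copy()
    n_comp = len(weights)
    log_post = np.empty(n_comp)
    cond_means = np.empty((n_comp, int(u.sum())))
    for k in range(n_comp):
        mu, cov = means[k], covs[k]
        diff = x[r] - mu[r]
        cov_rr = cov[np.ix_(r, r)]
        solved = np.linalg.solve(cov_rr, diff)
        _, logdet = np.linalg.slogdet(cov_rr)
        # Log posterior of component k given the reliable part
        # (terms shared by all components are dropped).
        log_post[k] = np.log(weights[k]) - 0.5 * (diff @ solved + logdet)
        # Conditional mean of the unreliable part under component k.
        cond_means[k] = mu[u] + cov[np.ix_(u, r)] @ solved
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    out = x.copy()
    out[u] = post @ cond_means
    return out
```

In this sketch, frame-based reconstruction corresponds to calling `reconstruct` on single frames, while the window-based variant would call it on rows of `stack_windows(frames, width)` with component covariances built by `mfa_covariance`.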

Similar resources

Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors

We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlapping temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from e...

Sparse Reconstruction of Multi-Window Time-Frequency Representation Based on Hermite functions

Multi-window spectrograms offer higher energy concentration in contrast to the traditional single-window spectrograms. However, these quadratic time-frequency distributions were not introduced to deal with randomly undersampled signals. This paper applies sparse reconstruction techniques to provide time-frequency representations of nonstationary signals using the Hermite functions as multiple w...

A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. Missing-feature methods are one common approach to robust speech recognition. In these methods, the components of the signal's time-frequency representation (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...

Scalable distributed speech recognition using Gaussian mixture model-based block quantisation

In this paper, we investigate the use of block quantisers based on Gaussian mixture models (GMMs) for the coding of Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. Specifically, we consider the multi-frame scheme, where temporal correlation across MFCC frames is exploited by the Karhunen–Loève transform of the block quantiser. Comp...

Time-dependent cross-probability model for multi-environment model based LInear normalization

In previous work, Multi-Environment Model based LInear Normalization (MEMLIN) was presented and shown to be effective at compensating for environment mismatch. MEMLIN is an empirical feature vector normalization which models clean and noisy spaces by Gaussian Mixture Models (GMMs). In this algorithm, the probability of the clean model Gaussian, given the noisy model one and the noisy featur...


Publication date: 2011